RMSProp is an optimizer intended to address AdaGrad’s tendency to shrink the learning rate to zero. RMSProp adds an additional parameter $\gamma$ to AdaGrad that effectively changes the state vector from a sum of past states to an exponential moving average of them. Larger values of $\gamma$ extend the number of previous states that make a non-negligible contribution.

Recall that AdaGrad’s state accumulation can be written as

$$\mathbf{s}_t \leftarrow \mathbf{s}_{t-1} + \mathbf{g}_t \odot \mathbf{g}_t,$$

where $\mathbf{g}_t$ is the gradient at step $t$.
To this, RMSProp adds scaling coefficients $\gamma$ and $(1 - \gamma)$ to the first and second terms respectively:

$$\mathbf{s}_t \leftarrow \gamma \, \mathbf{s}_{t-1} + (1 - \gamma) \, \mathbf{g}_t \odot \mathbf{g}_t.$$
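Unrolling this recursion makes the moving-average interpretation explicit. Assuming the state is initialized to $\mathbf{s}_0 = \mathbf{0}$, past squared gradients are weighted by powers of $\gamma$, so larger $\gamma$ keeps older gradients relevant for longer:

$$\mathbf{s}_t = (1 - \gamma) \sum_{i=0}^{t-1} \gamma^i \, \mathbf{g}_{t-i} \odot \mathbf{g}_{t-i}.$$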
The update rule for $\mathbf{x}_t$ remains the same as in AdaGrad:

$$\mathbf{x}_t \leftarrow \mathbf{x}_{t-1} - \frac{\eta}{\sqrt{\mathbf{s}_t + \epsilon}} \odot \mathbf{g}_t,$$
where $\eta$ is the learning rate, $\epsilon > 0$ is a small constant added for numerical stability, and again $\odot$ is the element-wise product.
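As a concrete illustration, here is a minimal NumPy sketch of a single RMSProp step following the equations above. The function name `rmsprop_update` and the hyperparameter defaults (`lr`, `gamma`, `eps`) are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def rmsprop_update(x, s, grad, lr=0.01, gamma=0.9, eps=1e-8):
    """One RMSProp step: exponential moving average of squared
    gradients, then an AdaGrad-style scaled parameter update."""
    # State: leaky average of the element-wise squared gradient.
    s = gamma * s + (1 - gamma) * grad * grad
    # Update: same form as AdaGrad, but using the moving-average state.
    x = x - lr / np.sqrt(s + eps) * grad
    return x, s

# Usage sketch: minimize f(x) = x0^2 + 2*x1^2 from a fixed start.
x = np.array([1.0, 2.0])
s = np.zeros_like(x)  # state starts at zero, matching s_0 = 0
for _ in range(100):
    grad = np.array([2 * x[0], 4 * x[1]])  # gradient of f at x
    x, s = rmsprop_update(x, s, grad, lr=0.1)
print(x)  # ends close to the minimum at (0, 0)
```

Note that because $\mathbf{s}_t$ is a moving average rather than a growing sum, the effective step size no longer decays toward zero over time, which is precisely the behavior that distinguishes RMSProp from AdaGrad.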